20 research outputs found

    Optimized fixed point implementation of a local stereo matching algorithm onto C66x DSP

    Stereo matching techniques aim to reconstruct disparity maps from a pair of images. Using them in embedded systems is challenging because of the complexity of state-of-the-art algorithms. An efficient local stereo matching algorithm was chosen from the literature and implemented on a C6678 DSP. Arithmetic simplifications, such as approximation by piecewise linear functions and fixed-point conversion, are proposed. Thanks to factorisation and pre-computation, the memory footprint is reduced by a factor of 13 to fit within the memory available on embedded systems. A speed of 14.5 fps (a 60-fold speed-up) is reached with a small quality loss in the final disparity map.
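The two arithmetic simplifications named in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the target function `exp(-x)`, the Q12 format, the input range and the segment count are all assumptions chosen for illustration.

```python
import math

# All constants below are illustrative assumptions, not values from the paper.
FRAC_BITS = 12                 # Q12 fixed point: 12 fractional bits
ONE = 1 << FRAC_BITS           # fixed-point representation of 1.0
X_MAX = 8                      # approximate exp(-x) on [0, 8); return 0 beyond
SEGMENTS = 32                  # number of linear pieces, each 0.25 wide
SEG_SHIFT = FRAC_BITS - 2      # 0.25 in Q12 is 2**10, so indexing is a shift

# Knot values of exp(-x), precomputed once and stored as integers.
KNOTS = [round(math.exp(-i * X_MAX / SEGMENTS) * ONE) for i in range(SEGMENTS + 1)]


def to_fixed(x: float) -> int:
    """Convert a float to Q12 fixed point."""
    return round(x * ONE)


def exp_neg_fixed(x_fx: int) -> int:
    """Piecewise linear approximation of exp(-x), x >= 0, integer-only."""
    if x_fx < 0:
        raise ValueError("x must be non-negative")
    idx = x_fx >> SEG_SHIFT                  # which segment x falls in
    if idx >= SEGMENTS:
        return 0                             # exp(-x) is negligible past X_MAX
    frac = x_fx & ((1 << SEG_SHIFT) - 1)     # position inside the segment
    y0, y1 = KNOTS[idx], KNOTS[idx + 1]
    # Linear interpolation between the two knots, using a shift instead of a divide.
    return y0 + (((y1 - y0) * frac) >> SEG_SHIFT)
```

At run time the function needs only a table lookup, one multiply, one add and two shifts, which is the kind of trade-off (a small, bounded approximation error for much cheaper arithmetic) the abstract describes.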

    Intégration rapide de services vidéo Mpeg sur architectures parallèles

    Real-time operation is a strong constraint for audiovisual applications, one that requires platforms built from several processing units. The goal of our work is to develop a rapid prototyping process on parallel architectures for image processing applications. The process starts with a description of the algorithms through an object-oriented visual programming interface. This description is then automatically transformed so that it can be used by Syndex, a software tool that evaluates and generates the schedule of the algorithm's tasks on multiprocessor architectures. We demonstrate the effectiveness of our methodology with the development of a substantial Mpeg-2 application and its multi-DSP implementation.

    How programming models can manage the problem of scaling


    Hardware code generation from dataflow programs

    Developing new systems on embedded targets is becoming more and more complex. In particular, multimedia devices are now implemented using mixed hardware and software architectures, which improve computational power but also increase design complexity and time to market. New design flows have been developed to help designers build complex architectures. These design flows are often based on languages with a higher level of abstraction. RVC-CAL is a dataflow programming language that provides suitable features in this context. An RVC-CAL dataflow program can be compiled to various target software languages (e.g. C, Java, LLVM) with the Open RVC-CAL Compiler (Orcc). In this paper, we present a new hardware code generator that produces high-quality, portable VHDL code with a hierarchical architecture from an RVC-CAL dataflow program in a matter of seconds. The paper explains the underlying principles of the hardware code generator and presents the results obtained from an Inverse DCT described as an RVC-CAL dataflow program.

    Exploration d'Espace de Conception pour des Techniques de Calculs Approximés pour la Mémoire

    Modern digital systems process more and more data. The resulting increase in memory requirements must be matched by processing capabilities and interconnections to avoid the memory wall. Approximate computing techniques exist to alleviate these requirements, but they usually require a thorough and tedious analysis of the processing pipeline. This paper presents an application-agnostic Design Space Exploration (DSE) of the buffer-sizing process to reduce the memory footprint of applications while guaranteeing an output quality above a defined threshold. The proposed DSE selects the appropriate bit-width and storage type for buffers to satisfy the constraint. We show that the proposed DSE reduces the memory footprint of the SqueezeNet CNN by 58.6% with identical Top-1 prediction accuracy, and that of the full SKA SDP pipeline by 39.7% without degradation, while testing only a subset of the design space. The proposed DSE is fast enough to be integrated into the design flow of applications.
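The idea of shrinking buffer bit-widths while keeping output quality above a threshold can be sketched as follows. This is a toy illustration only: the two-stage `pipeline`, the PSNR metric and the greedy one-bit-at-a-time search are assumptions, not the exploration strategy of the paper, and the storage-type choice is left out.

```python
import math


def quantize(values, bits, lo=-1.0, hi=1.0):
    """Uniformly quantize values clipped to [lo, hi] onto 2**bits levels."""
    step = (hi - lo) / ((1 << bits) - 1)
    return [lo + round((min(max(v, lo), hi) - lo) / step) * step for v in values]


def psnr(ref, out, peak=2.0):
    """Quality metric (assumed here): PSNR in dB against a reference output."""
    mse = sum((r - o) ** 2 for r, o in zip(ref, out)) / len(ref)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def pipeline(signal, widths):
    """Hypothetical two-stage pipeline; each stage writes a quantized buffer."""
    buf_a = quantize([0.5 * s for s in signal], widths["a"])
    buf_b = quantize([x * x for x in buf_a], widths["b"])
    return buf_b


def explore(signal, threshold_db, max_bits=16, min_bits=2):
    """Greedily drop one bit per buffer while quality stays above threshold."""
    widths = {"a": max_bits, "b": max_bits}
    reference = pipeline(signal, widths)      # full-precision reference output
    improved = True
    while improved:
        improved = False
        for name in ("a", "b"):
            if widths[name] <= min_bits:
                continue
            trial = dict(widths)
            trial[name] -= 1
            # Accept the narrower buffer only if the constraint still holds.
            if psnr(reference, pipeline(signal, trial)) >= threshold_db:
                widths = trial
                improved = True
    return widths


signal = [math.sin(0.1 * i) for i in range(200)]
chosen = explore(signal, threshold_db=40.0)
```

Because a configuration is only ever accepted after passing the quality check, the returned widths always satisfy the threshold. The search also evaluates far fewer configurations than the full 15 x 15 grid of width pairs, which loosely mirrors the abstract's point about testing only a subset of the design space.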

    Hardware Design and Implementation of Adaptive Multiple Transforms for the Versatile Video Coding Standard

    Versatile Video Coding (VVC) is the next-generation video coding standard, expected by the end of 2020. Several new contributions have been proposed to enhance coding efficiency beyond the High Efficiency Video Coding (HEVC) standard. One of these tools is the Adaptive Multiple Transform (AMT), a new approach to the transform core design. The AMT involves five DCT/DST transform types with larger and more flexible partitioning block sizes. However, the AMT coding efficiency comes at the cost of higher computational complexity, especially at the encoder side. In this paper, an efficient pipelined hardware implementation of the AMT, covering the five transform types for sizes 4x4, 8x8, 16x16 and 32x32, is proposed. The architecture design takes advantage of the internal software/hardware resources of the target FPGA device, such as Library of Parametrized Modules (LPM) core IPs and Digital Signal Processing (DSP) blocks. The proposed 1D 32-point AMT design can process 4K video at 44 frames per second. A unified 2D implementation of the 4, 8, 16 and 32-point AMT design is also presented. The implementation takes into account all the asymmetric 2D block size combinations from 4 to 32. The 2D architecture design is able to sustain 2K video coding at 50 frames per second with an operating frequency of up to 147 MHz.
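As a software illustration of how a 2D transform on asymmetric block sizes is built from 1D transforms, the sketch below applies a separable transform to a rectangular block. It covers only the floating-point orthonormal DCT-II, one of the transform families the abstract mentions; the other AMT types, the integer arithmetic and the pipelined hardware architecture are not modelled, and all function names are hypothetical.

```python
import math


def dct2_matrix(n):
    """Orthonormal DCT-II basis as an n x n list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                     for j in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def forward_2d(block):
    """Separable 2D transform of an h x w block: Y = C_h * X * C_w^T."""
    h, w = len(block), len(block[0])
    return matmul(matmul(dct2_matrix(h), block), transpose(dct2_matrix(w)))


def inverse_2d(coeffs):
    """Inverse of forward_2d, valid because the basis is orthonormal."""
    h, w = len(coeffs), len(coeffs[0])
    return matmul(matmul(transpose(dct2_matrix(h)), coeffs), dct2_matrix(w))


# Asymmetric 4 x 8 block: a 4-point transform on columns, an 8-point one on rows.
block = [[(3 * r + 5 * c) % 17 for c in range(8)] for r in range(4)]
restored = inverse_2d(forward_2d(block))
```

The vertical and horizontal passes use independently sized 1D transforms, so any height and width combination (for example 4x32 or 16x8) is handled by the same two-pass structure.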